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Abstract 

This paper proposes a hierarchical, multi-resolution framework for the identifi- 
cation of model parameters and their spatially variability from noisy measure- 
ments of the response or output. Such parameters are frequently encountered in 
PDE-based models and correspond to quantities such as density or pressure fields, 
elasto-plastic moduli and internal variables in solid mechanics, conductivity fields 
in heat diffusion problems, permeability fields in fluid flow through porous media 
etc. The proposed model has all the advantages of traditional Bayesian formula- 
tions such as the ability to produce measures of confidence for the inferences made 
and providing not only predictive estimates but also quantitative measures of the 
predictive uncertainty. In contrast to existing approaches it utilizes a parsimo- 
nious, non-parametric formulation that favors sparse representations and whose 
complexity can be determined from the data. The proposed framework in non- 
intrusive and makes use of a sequence of forward solvers operating at various 
resolutions. As a result, inexpensive, coarse solvers are used to identify the most 
salient features of the unknown field(s) which are subsequently enriched by in- 
voking solvers operating at finer resolutions. This leads to significant compu- 
tational savings particularly in problems involving computationally demanding 
forward models but also improvements in accuracy. It is based on a novel, adap- 
tive scheme based on Sequential Monte Carlo sampling which is embarrassingly 
parallelizable and circumvents issues with slow mixing encountered in Markov 
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Chain Monte Carlo schemes. The capabilities of the proposed methodology are 
illustrated in problems from nonlinear solid mechanics with special attention to 
cases where the data is contaminated with random noise and the scale of variabil- 
ity of the unknown field is smaller than the scale of the grid where observations 
are collected. 

1 Introduction 

The prodigious advances in computational modeling of physical processes and the development of 
highly non-linear, multiscale and multiphysics models poses several challenges in parameter iden- 
tification. We are frequently using large, forward models which imply a significant computational 
burden, in order to analyze complex phenomena.The extensive use of such models poses several 
challenges in parameter identification as the accuracy of the results provided depends strongly on as- 
signing proper values to the various model parameters. In mechanics of materials, accurate mechan- 
ical property identification can guide damage detection and an informed assessment of the system's 
reliability ( ll37l ). Identifying property-cross correlations can lead to the design of multi-functional 
materials ( ll62l ). In biomechanics, the detection of variations in mechanical properties of human tis- 
sue can reveal the appearance of diseases (arteriosclerosis, malignant tumors) but can also be used 
to assess the effectivity of various treatments (ID [21]). Permeability estimation for soil transport 
processes can assist in detection of contaminants, oil exploration etc. ( Il68ll23l ). 

We consider phenomena described by a set of (coupled) elUptic, parabolic or hyperbolic PDEs and 
associated boundary (and initial) conditions: 

Aiyixyj{x)) = 0, yxeV (1) 

where A denotes the differential operator defined on a domain I? G R'* , where d is the number 
of spatial dimensions. A depends on spatially varying coefficients f{x), x E V. Advances in 
computational mathematics have given rise to several efficient solvers for a wide-range of such 
systems and have revolutionized simulation-based analysis and design ( ll53l ). Our primary interest 
is to identify f{x) from a set of (potentially noisy) measurements of the response = y{xi) at 
a number of distinct locations Xi € 2?. In the case of time-dependent PDEs, the available data 
might also be indexed by time. Several different processes in solid and fluid mechanics, transport 
phenomena, heat diffusion etc fall under this general setting and even though the coefficients f{x) 
have a different physical interpretation, the associated inverse problems exhibit similar mathematical 
characteristics. 

Two basic approaches have been followed in addressing problems of data-driven parametric identi- 
fication. On one hand, deterministic optimization techniques which attempt to minimize the sum of 
the squares of the deviations between model predictions and observations. Gradient or global, in- 
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trusive or non-intrusive techniques are introduced for performing the optimization task. Usually the 
objective function is augmented with regularization terms (e.g. Tikhonov regularization 1591 ) which 
alleviate issues with the ill-posednesss of the problem ( ll60ll27l[T9ll64l l5l l38l ). Such deterministic 
inverse techniques based on exact matching or least-squares optimization, lead to point estimates of 
unknowns without rigorously considering the statistical nature of system uncertainties and without 
providing quantification of the uncertainty in the inverse solution. 

The direct stochastic counterpart of optimization methods involves frequentist approaches based 
on maximum likelihood estimators that aim at maximizing the probability of observations given 
the inverse solution maximum ( 1201 [TSl ). In recent years significant attention has been directed to- 
wards statistical approaches based on the Bayesian paradigm which attempt to calculate a (posterior) 
probability distribution function on the parameters of interest. Bayesian formulations offer several 
advantages as they provide a unified framework for dealing with the uncertainty introduced by the 
incomplete and noisy measurements and assessing quantitatively resulting inferential uncertainties. 
Significant successes have been noted in applications such as medical tomography ( l69l ). geological 
tomography ( l25l l2l). hydrology ( l44l ). petroleum engineering ( l28l l8l). as well as a host of other 
physical, biological, or social systems ( I42ll57ll67ll48l ). 

Identification of spatially varying model parameters poses several modeling and computational is- 
sues. Representations of the parametric fields in existing approaches artificially impose a mini- 
mum length scale of variability usually determined by the discretization size of the governing PDEs 
( l44l ). Furthermore, they are associated with a very large vector of unknowns. Inference in high- 
dimensional spaces using standard optimization or sampling schemes (e.g. Markov Chain Monte 
Carlo (MCMC)), is generally impractical as it requires an exuberant number of calls to the forward 
simulator in order to achieve convergence. Particularly in Bayesian formulations where the infer- 
ence results are much richer and involve a distribution rather than a single value for the parameters 
of interest, the computational effort implied by repeated calls to the forward solver can be enormous 
and constitute the method impractical for realistic applications. These problems are amplified if 
the posterior distribution is multi-modal i.e. several significantly different scenaria are likely given 
the available data. While it is apparent that, computationally inexpensive, coarser scale simulations 
can assist the identification process ( 1T41 ). the critical task of efficiently transferring the informa- 
tion across resolutions still remains ( l50l[3Tll68l ). Previous attempts using parallel tempering (e.g. 
l33l ) or hierarchical representations based on Markov trees ( l65l ) require performing inference on 
representations at various resolutions simultaneously. 

In the present paper we adopt a nonparametric model which is independent of the grid of the forward 
solver and is reminiscent of non-parametric kernel regression methods. The unknown parametric 
field is approximated by a superposition of kernel- type functions centered at various locations. The 
cardinality of the representation, i.e. the number of such kernels, is treated as an unknown to be 
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inferred in the Bayesian formulation. This gives rise to a very flexible model that is able to adapt to 
the problem and the data at hand and find succinct representations of the parametric field of interest. 
Prior information on the scale of variability can be directly introduced in the model. 

Inference is performed using Sequential Monte Carlo samplers. They utilize a set of random sam- 
ples, named particles, which are propagated using simple importance sampling, resampling and 
updating/rejuvenation mechanisms. The algorithm is directly parallelizable as the evolution of each 
particle is by-and-large independent of the rest. The sequence of distributions defined is based on 
using solvers that operate on different resolutions and which successively produce finer discretiza- 
tions. This results in an efficient hierarchical approach that makes use of the results from solvers 
operating at the coarser scales in order to update them based on analyses on a finer scale. The partic- 
ulate approximations produced provide concise representations of the posterior which can be readily 
updated if more data become available or if more accurate solvers are employed. 



2 Problem Definition & Motivation 



In lieu of a formal definition, we discuss an extremely simple problem which nevertheless possesses 
the most important features for the purposes of this work. Consider the steady-state heat equation in 
the unit interval, i.e.: 

where c{x) is the spatially varying conductivity field and T{x) the temperature profile. Assume 
that known boundary conditions r(0) = and (~c(x)^)^ = q are imposed and temperature 
measurements T; (without any noise) are obtained at N distinct points Xi E [0, 1] with the intention 
of identifying the unknown conductivity and its spatial variation. 

For any interval Axi = Xi+i — Xi between two observation locations, the governing PDE and 
boundary conditions imply that: 

\Jx, c(x) J T,+i - 

Similar expressions hold for all other intervals and relate the effective conductivity in each subdo- 
main (given by the harmonic mean) with the measured temperature. These relations however do 
not uniquely identify the spatial variability of c{x) unless the latter is assumed constant within 
each Aa:^. Further constraints can be imposed by assuming continuity of c{x) at the .t^'s but these 
do not necessarily hold if one considers materials that consist of distinct phases. Even when such 
constraints seem plausible, one can readily imagine parametric forms of c{x) (i.e. polynomials of 
high degree) which cannot be completely identified unless further constraints (e.g. continuity of the 
derivatives of c{x) at Xj) are artificially imposed. The non-uniqueness persists when the number of 
measurements N increases even though the space of possible solutions shrinks. It also precludes 
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the possibility of detecting significant changes in c{x) that occur in length scales much smaller than 
Axi (e.g. flaws) which are generally of significance to the analyst. Their contribution to the effective 
conductivity in Equation Q can be negligible unless Axi is of comparable size. This ill-posedness 
has long been identified and can become more pronounced in two or three dimensional domains 
and if the governing PDEs are nonlinear or involve more than one unknown parameters or fields 
( ||38ll ). It is also amplified if the measurements obtained are contaminated by random noise which is 
generally the case in engineering practice. 

Hence there is a need for a general framework that can produce estimates about the unknown fields 
particularly with regards to the scale of their variability. This is especially important as the accuracy 
of the predictions of computational models is greatly influenced by the the multiscale nature of 
property variations. In recent years a lot of research efforts have been devoted to the development 
of scalable, black-box simulators that provide the coarse-scale solution while capturing the effect 
of fine-scale fluctuations ( 1121 ). The multiscale analysis of such systems inherently assumes that 
the complete, fine-scale variation of various properties (or model parameters in general) is known. 
This assumption limits the appUcability of these frameworks since it is usually impossible to directly 
determine the complete structure of the medium of interest at the finest scale. More often than not, 
what is experimentally available and accessible (as in the example above), are measurements of 
the response of these systems under prescribed input or excitation, at spatial scales much coarser 
than those of the property variations. In problems of estimation of soil permeability for example, 
measurements are restricted to a few bore holes several meters apart from each other. In estimating 
damage in an aircraft fuselage, measurements of the response (displacements, accelerations etc) are 
collected at a few locations. 

This limited and noisy information naturally introduces a lot of uncertainty and necessitates view- 
ing the property variation as a random field whose statistical properties must be consistent with the 
available data. To that end the present paper proposes a general framework that is based on the 
Bayesian paradigm and addresses the following questions: a) How can one utilize deterministic, 
forward solvers in order to identify spatial variability of various properties while accounting for the 
associated uncertainty? b) How can this process produce estimates at various resolutions?, c) As 
these forward models are computationally demanding, how can this process be done in a computa- 
tionally efficient manner?, d) How can the available data be used to quantify error or discrepancies 
in the forward models? 

In the following sections we discuss the characteristics of the proposed Bayesian model with partic- 
ular emphasis on the prior specifications and their physical implications. We then present a general, 
efficient inference technique for the determination of the posterior and discuss how predictions in the 
context of computational models can be achieved. We finally illustrate the capabilities in numerical 
examples. 



5 



3 Methodology 



3.1 Hierarchical Bayesian Model 

The central goal of this work is to build mathematical methods that utilize limited and noisy ob- 
servations/measurements in order to identify the spatial variability of model parameters. Given the 
significant uncertainty arising from the random noise, lack of data and model error, point estimates 
are of little use. Furthermore it is important to quantify the confidence in the estimates made but also 
in the predictive ability of the the model of interest. To that end we adopt a Bayesian perspective. 
Bayesian formulations differ from classical statistical approaches (frequentist) in that all unknown 
parameters (denoted by 6) are treated as random. Hence the results of the inference process are not 
point estimates but distribution functions. 

The basic elements of Bayesian models are the likelihood function L{6) = p{y \ 6) which is a con- 
ditional probability distribution and gives a (relative) measure of the propensity of observing data 
y for a given model configuration specified by the parameters 6. The likelihood function is also 
encountered in frequentist formulations where the unknown model parameters 6 are determined by 
maximizing L{6). This could be thought as the probabilistic equivalent of deterministic optimiza- 
tion techniques commonly used in inverse problems. It can suffer from the same issues related to 
the ill-posedeness of the problem. The second component of Bayesian formulations is the prior dis- 
tribution p{9) which encapsulates in a probabilistic manner any knowledge/information/insight that 
is available to the analyst prior to observing the data. Although the prior is a point of frequent crit- 
icism due to its inherently subjective nature, it can prove extremely useful in engineering contexts 
as it provides a mathematically consistent vehicle for injecting the analyst's insight and physical 
understanding. The combination of prior and likelihood based on Bayes' rule yields the posterior 
distribution tt{6) which probabilistically summarizes the information extracted from the data with 
regards to the unknown 6 : 

n{e)=p{e\y)^P^yl^^p^^piy\e)p{e) (4) 

p{y) 

Hence Bayesian formulations allow for the possibility of multiple solutions - in fact any 6 in the 
support of the likelihood and the prior is admissible - whose relative plausibility is quantified by 
the posterior Credible or confidence intervals can be readily estimated from the posterior which 
quantify inferential uncertainties about the unknowns. 

Without loss of generality, we postulate the existence of a deterministic, forward model which in 
most cases of practical interest corresponds to a Finite Element or Finite Difference model of the 
governing differential equations. Naturally, forward models allow for various levels of discretiza- 
tion of the spatial domain and let r denote the resolution they operate upon (larger r implies finer 
resolution). In this paper, forward solvers are viewed as messengers, that carry information about 
the underlying material properties as they manifest themselves in the response (mechanical, thermal 



6 



Bayesian Model 
Nonparametric prior 




* sensor locations 



black-box solver 
coarse resolution 



inferred field 



black-box solver 
medium resolution 











black-box solver 
fine resolution 










black-box solver 
finest resolution 








Figure 1: Hierarchy of solvers operating on different resolutions 
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etc) of the medium of interest. This is especially true in the context of recently developed upscaling 
schemes (Ol ES] [H gOl [HI |56l |63l SI) which attempt to capture the effect of finer scale mate- 
rial variability while operating on a coarser grid. In general, the finer the resolution of the forward 
solver, the more information this provides. This however comes at the expense of computational 
effort. It is not unusual that the sufficient resolution of the property fluctuations in many systems of 
practical interest requires several CPU-hours for a single analysis. Despite the fidelity and accuracy 
of such high-resolution solvers, they can be of little use in the context of parameter identification as 
they will generally have to be called upon several times and several system analyses will have to be 
performed. 

Hence an accurate but expensive messenger is not the optimal choice if several pieces of information 
need to be communicated. In many cases however, the fidelity of the message can be compromised 
if the expense associated with the messenger is smaller. This is especially true if the loss of accuracy 
can be quantified, measures of confidence can be provided and furthermore if it leads to the same 
decisions/predictions. In this project we propose a consistent framework for using faster but less- 
accurate forward solvers operating on coarser resolutions in order to expedite property identification. 
Furthermore these solvers provide a natural hierarchy of models that if appropriately coupled can 
further expedite the identification process. Following the analog introduced earlier, we propose 
using inexpensive messengers (coarse scale solvers), several times to communicate the most pivotal 
pieces of information and more expensive messengers (fine scale solvers) fewer times to pass on 
some of the finer details (Figure [T]!. 

In the remainder of this sub-section, we discuss the basic components of the Bayesian model pro- 
posed, with particular emphasis on the prior for the unknown parametric fields. We then present 
(sub-section l3.2b the proposed inference techniques for the determination of the posterior. 

3.1.1 Likelihood Specification 

Let F"^ — {F[} : Q ^ £ denote the vector- valued mapping implied by the forward model (operat- 
ing at resolution r), which given f{x) E Q (Equation ^) provides the values of response quantities 
represented by the data y = {yi} e £. This function is the discretized version of the inverse of 
the differential operator A in Equation ([T]i parameterized by f{x). Each evaluation of F"^ for a 
specific field f{x) implies a call to the forward solver (e.g. Finite Elements) that operates on a 
discretization/resolution r. In the proposed framework, the function F"^ will be treated as a black 
box. Naturally data and model predictions will deviate when the former are obtained experimentally 
due to the unavoidable noise in the measurements. Most importantly perhaps this deviation can be 
the result of the model not fully capturing the salient physics either because the governing PDEs are 
an idealization or because of the discretization error in their solution. We postulate the following 
relationship: 
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^ = Ft\f{x)) +e^^ z = l,2,...,n (5) 

datum i model prediction 

(r) 

where e,- quantify the deviation between model predictions and data, and which will naturally 
depend on the resolution r of the forward solver. Quite frequently the data available to us are in 
the form of disparate observations, that correspond to different physical phenomena (e.g. temper- 
atures and displacements in a thermo-mechanical problem) in which case the computational model 
corresponds to a coupled multiphysics solver. 

The probabilistic model for e[ in Equation (|5]l gives rise to the likelihood function (Equation (|4|i). In 

(r) 

the simplest case where are assumed independent, normal variates with zero mean and variance 



Pr{yt\ f{x),ar) OC — exp{-- 2 } 



and pr{y \ f{x),ar) oc 



^-p{-^E(..-^^^^(/(.)))^ (6) 

i—l 



More complex models which can account for the spatial dependence of the error variance cr^ or the 
detection of events associated with sensor malfunctions at certain locations, can readily be formu- 
lated. In general the variances cr^ are unknown (particularly the component that pertains to model 
error) and should be inferred from the data. When a conjugate, Gamma{a, b) prior is adopted for 
CT^^, the error variances can be integrated out from Equation (|6]l further simplifying the likelihood: 
Lrim) = p{y I fix)) (X na + n/2) ^_ 



where r(z) = °° e"* dt is the gamma function. 

It should be noted that in some works ( 1391 l32l ). explicit distinction between model and observation 
errors is made, postulating a relation of the following form: 

observation/ data = model prediction + model error + observation error (8) 



As it has been observed ( iTOl ). independently of the amount of data available to us, these three 
components are not identifiable, meaning several different values can be equally consistent with the 
data. This however does not imply that all possible values are equally plausible. For example a large 
number of values of the observation error that are all positive or all negative (for all observations) 
are not consistent with the perception of random noise but most likely imply a bias of the model 
or perhaps a miscalibrated sensors used to collect the data. Bayesian formulations are highly suited 
for such problems as they provide a natural way of quantifying a priori and a posteriori relative 
measures of plausibility. In the following we restrict the presentation on models of Equation (|5]l as 
the focus of is on identifying the scale of variability of material properties f{x). 
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3.1.2 Prior Specification 

The most critical component involves the prior specification for the unknown material properties as 
represented by f{x). In existing Bayesian ( ll67l [37l ). but also deterministic (optimization-based), 
formulations, f{x) is discretized according to the spatial resolution of the forward solver. For ex- 
ample, in cases where finite elements are used, the property of interest is assumed constant within 
each element and therefore the vector of unknowns is of dimension equal to the number of elements. 
This offers obvious implementation advantages but also poses some difficulties since the scale of 
variability of material properties is implicitly selected by the solver rather than the data. This is 
problematic in several ways. On one hand if the scale of variability is larger than the grid, a waste 
of resources takes place, at the solver level which has to be run at unnecessarily fine resolutions, and 
at the level of the inference process which is impeded by the unnecessarily large dimension of the 
vector of unknowns. Furthermore, as the number of unknowns is much larger by comparison to the 
amount of data it can lead to over-fitting. This will produce erroneous or even absurd values for the 
unknowns that may nevertheless fit perfectly the data. Such solutions will have negligible predictive 
ability and would be useless in decision making. On the other hand, if the scale of variability is 
smaller than the grid, it cannot be identified even if the solver provides sufficient information for 
discovering this possibility. 

In order to increase the flexibiUty of the model, we base our prior models for the unknown field(s) 
,f{x) on the convolution representation of a Gaussian process. An alternative representation of a 
stationary Gaussian process involves a convolution of a white noise process a{x) with a smoothing 
kernel K{.;(j)) depending on a set of parameters </) (|[3l l29l ): 



fix) = J K{x-z-^)a{z)dz (9) 
The kernel form determines essentially the covariance of the resulting process, since: 

cov {f{xij{x2)) = E[f{xi, f{x2)] = j K{xi - z;(j))K{xi - z; (j)) dz 
For computational purposes, a discretized version of Equation (|9]l is used: 



(10) 



k k 

f{x) = ^a{zj)K{x ~ Zj\(l)) ^^ajK{x- Xj;(f>) (11) 

In order to increase the expressive ability of the aforementioned model we introduce two improve- 
ments. Firstly we consider that the set of kernel parameters (j) is spatially varying resulting in a 

non-stationary process: fc 

f{x) = aa + Y^ ajKj{x\ 4>j) xe D (12) 

where oq corresponds to a value of (f>o such that the corresponding kernel is 1 everywhere. Such 
representations can be viewed as a radial basis network as in 1611 ). Furthermore by interpreting 
the kernels as basis functions. Equation (fT2l i it can be seen as an extension of the the representer 
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theorem of Kimeldorf and Wahba ( [,411 ). Overcomplete representations as in Equation (fT2] i have 
been advocated because they have greater robustness in the presence of noise, can be sparser, and 
can have greater flexibility in matching structure in the data ( B6ll47l ). One possible selection for 
the functional form of Kj , that also has an intuitive parameterization with regards to the scale of of 
variabiUty in the material properties, is isotropic, Gaussian kernels: 

K{x;cl)j = {xj,Tj)) = exp{-Tj- || x-xj |p} (13) 

The parameters Tj directly correspond to the scale of variability of f{x). Large Tj 's imply narrowly 
concentrated fluctuations and large values slower varying fields. The center of each kernel is speci- 
fied by the location parameter Xj. Other functional forms (e.g. discontinuous) can also be used on 
their own or in combinations to enrich the expressivity of the expansion in Equation (fT2b . Wavelets, 
steerable wavelets, segmented wavelets, Gabor dictionaries, multiscale Gabor dictionaries, wavelet 
packets, cosine packets, chirplets, warplets, and a wide range of other dictionaries that have been 
developed in various contexts (||6l) offer several possibilities. 

The second important improvement is that we allow the size of the expansion k to vary. It is obvious 
that such an assumption is consistent with the principle of parsimony, which states that prior models 
should make as few assumptions as possible and allow their complexity to be inferred from the data. 
Hence the cardinality of the model, i.e. the number of basis functions k is the key unknown that 
must be determined so as to provide a good interpretation of the observables. 

Independently of the form of the kernel adopted, the important, common characteristic of all such 
approximations (as in Equation (fT2] i) is that the field representation does not depend on the resolution 
of the forward model. The latter affects inference only through the black-box functions (Equation 
(|5]l, Figure[T]i) as it will be illustrated in the next sections. 

The parameters of the prior model adopted consist of: 

• k: the number of kernel functions needed, 

• {cLj}^^^, the coefficients of the expansion in Equation ( fT2b . Each of those can be a scalar 
or vector depending on the number of material property fields we want to infer simultane- 
ously. For example, in a problem of thermo-mechanical coupling where the data consists of 
temperatures and displacements and we want to identify elastic modulus and conductivity, 
each flj will be a vector in M? . 

• {t? the precision parameters of each kernel which pertain to the scale of the unknown 
field(s), and 

• {ajj}*^^]^ the locations of the kernels which are points in T). 

In accordance with the Bayesian paradigm, all unknowns are considered random and are assigned 
prior distributions which quantify any information, knowledge, physical insight, mathematical con- 
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stxaints that is available to the analyst before the data is processed. Naturally, if specific prior infor- 
mation is available it can be reflected on the prior distributions. We consider prior distributions of 
the following form (excluding hyperparameters): 

p(fc,{aj}jto>{^j}j=i,{a;j}j=i) oc p{k) 

X P({aj}j=o I k) 
X p{{T,}U\k) 

In order to increase the robustness of the model and exploit structural dependence we adopt a hier- 
archical prior model ( ||24|| ). 

Model Size: 

Pivotal to the robustness and expressivity of the model is the selection of the model size, i.e. of the 
number of kernel functions k in Equation ( fT2] i. This number is unknown a priori and in the absence 
of specific information, sparse representations should be favored. This is not only advantageous 
for computational purposes, as the number of unknown parameters is proportional to fc, but also 
consistent with the parsimony of explanation principle or Occam's razor ( 1361 l54l l52l ). For that 
purpose, we propose a truncated Poisson prior for k: 

I otherwise 

The truncation parameter k„iax is selected based on computer memory limitations and defines the 
support of the prior This prior allows for representations of various cardinalities to be assessed 
simultaneously with respect to the data. As a result the number of unknowns is not fixed and the 
corresponding posterior has support on spaces of different dimensions as discussed in more detail in 
the sequence. In this work, an exponential hyper-prior is used for the hyper-parameter A to allow for 
greater flexibility and robustness i.e. p{\ \ s) = s cxp{— A s}. After integrating out A we obtain: 

P{k I s) cx , for fc = 0, 1, . . . , (16) 

Scale: 

The most critical perhaps parameters of the model are {tj}*^^]^ which control the scale of variability 
in the approximation of the unknown field(s). If prior information about this is available then it can 
be readily accounted for by appropriate prior specification. In the absence of such information how- 
ever multiple possibiUties exist. In contrast to deterministic optimization techniques where ad-hoc 
regularization assumptions are made, in the Bayesian framework proposed possible solutions are 
evaluated with respect to their plausibility as quantified by the posterior distribution. This provides 
a unified interpretation of various assumptions that are made regarding the priors of the parameters 
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involved. For example, consider a general Gamma{ar , br) prior: 

p({r,}^ti I k, a,, K) = n eM-brT,) (17) 

This has a mean ar/hr and coefficient of variation \/ y/a^. Diffuse versions can be adopted by 
selecting small a^-. A non-informative prior p(tj ) oc l/r^ arises as a special case for 0^ — 2 and 
6r = which is invariant under rescaling. Furthermore, it offers an interesting physical inter- 
pretation as it favors "slower" varying representations (i.e. smaller r's). In order to automatically 
determine the mean of the Gamma prior, we express hr ~ fija^ where fij is a location parameter 
for which an Exponential hyper-prior is used with a hyper-parameter i.e. p{fJ,j) = ■^e~^i/°''^ . 
Integrating out the /i^ 's leads to following prior: 



p{{r,}U I ar,a,) = J] ^^^^ <^ 



1 



(18) 



Other Parameters: 

For the coefficients aj a multivariate normal prior was adopted: 

{a,}';^o\k,'jl^N{0,allk+i) (19) 

where Ik+i is the (fc + 1) x (fc + 1) identity matrix. The hyper-parameter which controls the 
spread of the prior is modeled by the standard inverse gamma distribution Ijiv — Gamma{ao, bo). 
It can readily be integrated-out leading to the following prior for a^ 's: 



Pi{aj}-=o I fc, ao, bo) = 7^;^T(^TT)71 7 ^ .ao+(fc+i)/2 (^0) 



Finally, for the unknown kernel locations {xj}j^i, a uniform prior in V is proposed i.e.: 

Pii^AU I - ^ (21) 



where | I? | is the length or area or volume of V in one, two or three dimensions respectively. 
Naturally if prior information is available about subregions with significant property variations this 
can be incorporated in the prior. 

Complete Model: 

Let dk = {{fljlj^o' {'^i}J=i, {^j}j=i} ®k denote the vector containing aU the unknown param- 
eters and 6 = (fc, 9k)- Since fc is also assumed unknown and allowed to vary, the dimension of 6k is 
variable as well and &k = (M''+^ x (R+)'^ x T)''. In 2D for example and assuming a scalar unknown 
field f{x) in the expansion of Equation (fT2b the dimension of 0^ is (fc + 1) + fc + 2fc = 2 + 4fc. 
Based on Equation (fT4l i and Equations ( fT6] l, (fTTT i, ( |20l i and (ISTT i. the complete prior model is given 
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by: 

p{e I s, ar.a^j,, ao,bo) = 



1 



X 



X 



-pj- r(a^ + 1) I 1 

1 r(ao + ^) 

(2^)(fc+l)/2 . .ao + (fc+l)/2 

1 



(22) 



The combination of the prior p{9) with the UkeUhood Lr{9) (Equation (Q) corresponding to a for- 
ward solver operating on resolution r, give rise to the posterior density tt^ (9) which is proportional 
to: 

TTrW = pr.i9 \ y) <^ Lr.{9) p{9) (23) 

Even though several parameters have been removed from the vector of unknowns 9 and marginalized 
in the pertinent expressions, the corresponding posteriors can be readily be obtained, or rather be 
sampled from, once the posteriors 7r,.(0) have been determined. As it is shown in the numerical 
examples, of interest could be the variance ct^ of the error term (Equations (|5]l, Q) which quantifies 
the magnitude of the deviation between model and data and can serve as a validation metric (in the 
absence of observation error) or be used for predictive purposes (see section [373] ). From Equation Q 
and the conjugate prior model adopted for a^, it can readily be shown that the conditional posterior 
is given by a Gamma distribution: 

p{a-^,9\y) = p{a-^ \ 9) 7rr{9 \ y) 



and 

p(cr,7^ I 9) = Gamma 



o,&+ —\ (24) 



v 



2' 



In the context of Monte Carlo simulation, this trivially implies that once samples 9 from tt,. have 
been obtained, the samples of cr^-^ can also be drawn from the aforementioned Gamma. 

The support of the posteriors tt^ lies on u'I'^q" {k} x ©j.. Two important points are worth empha- 
sizing. Firstly, Equation ( |23] ) defines a sequence of posterior densities, each corresponding to a 
different likelihood and a different forward solver of resolution r. It is clear that the black-box func- 
tions F'^^'' appearing in the likelihood in Equation ^ imply denser mappings for smaller r. This 
is because solvers corresponding to coarser resolutions of the governing PDEs are more myopic 
(compared to solvers at finer resolutions) to small scale fluctuations of the spatially varying model 
parameters f{x) (parameterized by 9). As a result the likelihood functions Lr and the associated 
posteriors tt^ will be flatter and have fewer modes for smaller r. The task of identifying these poste- 
riors becomes increasingly more difficult as we move to solvers of higher refinement (i.e. larger r). 
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It is this feature that we propose of exploiting in the next section in order to increase the accuracy 
and improve on the efficiency of the inference process. In addition, the posteriors tt^ are only known 
up to a normalizing constant (determining p{y) in Equation (|4]i involves an infeasible and unneces- 
sary integration in a very high dimensional space). Each evaluation of tt^ for a particular 6 requires 
calculating F'-^^ and therefore calling the corresponding black-box solver As each of these runs of 
the forward solver may involve the solution of very large systems of equations they can be extremely 
time consuming. It is important therefore to determine tt^ not only accurately, but also with the least 
possible number of calls to the forward solver. Since solvers corresponding to coarser resolutions 
(smaller r) are faster, it would be desirable to utilize the information they provide in order to reduce 
the number of calls to more expensive, finer resolution solvers. 

3.2 Determining the Posterior - Inference 

The posterior defined above is analytically intractable. For that reason, Monte Carlo methods pro- 
vide essentially the only accurate way to infer tt^. Traditionally Markov Chain Monte Carlo tech- 
niques (MCMC) have been employed to carry out this task (||30]|45]|44l|66l|22l). These are based 
on building a Markov chain that asymptotically converges to the target density (in this case tt^) by 
appropriately defining a transition kernel. While convergence can be assured under weak conditions 
( P9l |55l ). the rate of convergence can be extremely slow and require a lot of likelihood evaluations 
and calls to the black-box solver. Particularly in cases where the target posterior can have multiple 
modes, very large mixing times might be required which constitute the method impractical or in- 
feasible. In addition, MCMC is not directly parallelizable, unless multiple independent chains are 
run simultaneously and it can be difficult to design a good proposal distribution when operating in 
high dimensional spaces. More importantly perhaps, standard MCMC is not capable of providing a 
hierarchical, multi-resolution solution to the problem. Consider for example, the case that several 
samples have been drawn using MCMC from the posterior tt^j corresponding to a solver operating 
on resolution r = ri . If samples of the posterior tt^^ are needed, corresponding to a solver of finer 
resolution r2 > ri but not significantly different from ri, then MCMC iterations would have to be 
initiated anew. Hence there is no immediate way to exploit the inferences made about tt^i even 
though the latter might be quite similar to 11^2 ■ 

In this work we advocate the use of Sequential Monte Carlo techniques (SMC). They represent a 
set of flexible simulation-based methods for sampling from a sequence of probability distributions 
( II5T] [161 ). As with Markov Chain Monte Carlo methods (MCMC), the target distribution(s) need 
only be known up to a constant and therefore do not require calculation of the intractable integral 
in the denominator in Equation (|4|i. They utilize a set of random samples (commonly referred to 
as particles), which are propagated using a combination of importance sampling, resampling and 
MCMC-based rejuvenation mechanisms ( IfTTlfTOl ). Each of these particles, which can be thought 
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of as a possible configuration of the system's state, is associated with an importance weight which 
is proportional to the the posterior value of the respective particle. These weights are updated se- 
quentially along with the particle locations. Hence if {6^^'^ , luf*'' }iLi represent N such particles and 
associated weights for distribution 7r,.(0) then: 

N 

7r,(0)«^ W^«5g(o(0) (25) 

i=l 

where wi*^ = wi'^/J^Z 

_^ u)r*^ are the normalized weights and Sg(i) (.) is the Dirac function cen- 
tered at 0^^\ Furthermore, for any function h{6) which is Tr^-integrable (ll9]|2l): 

N 

Wj:''> h{ei'^) ^ h{e) TTr{9) de almost surely (26) 
Before discussing the SMC sampler proposed, it is worth recapitulating the basic desiderata: 

a) Accuracy: the Monte Carlo scheme should be able to correctly sample from multi-modal 
distributions 

b) Hierarchical, Multiscale: the Monte Carlo scheme should be able to exploit inferences 
made using forward solvers corresponding to coarser resolutions and refine them as more 
elaborate forward solvers are used. 

c) Efficiency: the Monte Carlo sampler should require the fewest possible calls to the forward 
solver. It should be directly parallelizable and utilize inferences made using cheaper for- 
ward solvers corresponding to coarser resolutions in order to reduce the number of calls to 
more expensive forward solvers corresponding to finer resolutions. 

The goal is to obtain samples from each of the posterior distributions in Equation (l23T l correspond- 
ing to solvers with increasingly finer spatial resolution of the governing PDEs, r = ri, r2, . . . , tm 
where ri is the coarsest to r^f the finest. For economy of notation we define the artificial posterior 
TTr„ (9) = p{9) that coincides with the prior (which is common to all resolutions and independent of 
the forward solver). To demonstrate the proposed process it suffices to consider a pair of these pos- 
terior densities tti{9) cx Li{6) p{6) and tt2{9) oc L2{9) p{6) corresponding to forward solvers at 
two successive resolutions r^^ and r^^ (Figure[2]) and discuss the inferential transitions. Let TTi2^-y{6) 
denote a sequence of artificial, auxiliary distributions defined as follows: 

^12.7(0) = 7T[''''\e) nliO) = l['-''\9) me) p{9) 7 e [0, 1] (27) 

where 7 plays the role of reciprocal temperature. Trivially for 7 = we recover tti and for 7 = 1, 
712- The role of these auxiliary distributions is to bridge the gap between tti and tt2 and provide a 
smooth transition path where importance sampling can be efficiently applied. In this process, in- 
ferences from the coarser scale solver are transferred and updated to conform with the finer scale 
solver Starting with a particulate approximation for TTr„ (6) ~ p{9) (which trivially involves draw- 

(i) 

ing samples from the prior with weights Wq = 1), the goal is to gradually update the importance 
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Figure 2: Illustration of bridging densities as defined in Equation dZTI i between posterior distribu- 
tions TTi{9), n2{9) corresponding to different resolutions of the governing PDEs. These allow for 
accurate and computationally efficient transmission of the inferences made to finer scales. 



weights and particle locations in order to approximate the target posteriors at various resolutions. In 
order to implement computationally such a transition we define an increasing sequence of {7s}f=i 
with 7o = and 7s = 1 (see sub-section 13.2. 11 1. An SMC-based inference scheme would then 
proceed as described in Table [T] 

SMC algorithm: 

1. For s ~ 0, let {0q\ WQ^'y^i be the initial particulate approximation to 
7ri2,7o = ''^1- Set s = 1. 

2. Reweigh: Update weights w^f' = ui^'^i ^^"7/)'' 

3. Rejuvenate: Use an MCMC kernel -Ps(.,.) that leaves 7ri2,7^ invariant to 
perturb each particle d^^l-^ ^ 0^'^ 

4. Resample: Evaluate the Effective Sample Size, ESS = 1/ E^i(W^i+i)^ 
and resample the population if it is less than a prescribed threshold ESSmm- 

5. The current population {0^'-', Ws*''}fLi provides a particulate approximation 
of 7ri2.'y^ in the sense of Equations (l25l l, (|26] |. 

6. If s < S" (and 7s < 1) then set s = s + 1 and goto to step 2. Otherwise stop. 



Table 1 : Basic steps of an SMC algorithm 



17 



The role of the Reweighing step is to correct for the discrepancy between the two successive 
distributions in exactly the same manner that importance sampling is employed. The Re- 
sampling step aims at reducing the variance of the particulate approximation by eliminating 
particles with small weights and multiplying the ones with larger weights. The metric that 
we use in carrying out this task is the Effective Sample Size (ESS, Table[TJ which provides 
a measure of degeneracy in the population of particles as quantified by their variance. If 
this degeneracy exceeds a specified threshold, resampling is performed. As it has been 
pointed out in several studies ( IfTSl ). frequent resampling can deplete the population of its 
informational content and result in particulate approximations that consist of even a single 
particle. Throughout this work ESSmin = N/2 was used. Although other options are 
available, multinomial resamphng is most often applied and was found sufficient in the 
problems examined. 

A critical component involves the perturbation of the population of samples by a standard 
MCMC kernel in the Rejuvenation step as this determines how fast the transition takes 
place. Although there is freedom in selecting the transition kernel (., .) (the only require- 
ment is that it is 7112,7^ -invariant), there is a distinguishing feature that will be elaborated 
further in the next sub-section (see l3.2.2"] i. The target posteriors tt^ (as well as the interme- 
diate bridging distributions in Equation dZTl i) live in spaces of varying dimensions as pre- 
viously discussed. Hence an exploration of the state space must involve trans-dimensional 
proposals. Pairs of such moves can be defined in the context of Reversible -Jump MCMC 
(RJMCMC , E6\ ) such as adding/deleting a kernel in the expansion of Equation ( fT2] i. or 
splitting/merging kernels (see l3.2.2] i. Even though it is straightforward to satisfy the invari- 
ance constraint in the RJMCMC framework, it is more difficult to design moves that also 
mix fast. As each (RJ)MCMC requires a likelihood evaluation and a call to a potentially 
expensive forward solver, it is desirable to minimize their number while retaining good 
convergence properties. 

In most implementations of such SMC schemes, the sequence of intermediate, bridging dis- 
tributions is fixed a priori. In order to ensure a smooth transition, a large number is selected 
at very closely spaced 7^. It is easily understood that for reasons of computational effi- 
ciency, it is desirable to minimize the number of intermediate bridging distributions while 
ensuring that the successive distributions are not significantly different. In sub-section 
( 13.2. It we discuss a novel adaptive scheme that allow the automatic determination of these 
distributions resulting in significant computational savings. 
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• It should be noted that the framework proposed is directly parallelizable, as the evolution 
(reweighing, rejuvenation) of each particle is independent of the rest. Hence the computa- 
tional effort can be readily distributed to several processors. 

• The particulate approximations obtained at each step, provide a concise summary of the 
posterior distribution based on the respective forward solver This can be readily updated 
in the manner explained above, if forward solvers at finer resolutions become available 
or computationally feasible. Similar bridging distributions can be established between 
distinct forward solvers with differences going beyond their respective resolutions. This is 
made possible by the nonparametric Bayesian model which is independent of the forward 
solver and the flexible inference engine based on SMC. 

• An advantageous feature of the proposed framework is that the confidence in the estimates 
made can be readily quantified by establishing posterior (or credible) intervals, i.e. the 
posterior probability that the unknown field of interest f{x) exceeds or not a specified 
threshold, from the particulate approximations (Equation (|25])). It is these credible intervals 
(or in general measures of the variability in the estimates such as the posterior variance) that 
can guide adaptive refinement of the governing PDEs. Traditionally, adaptive refinement 
has been based on estimates of some error norm in the solution of the governing PDEs (lH). 
This however is inefficient and inadequate for the purposes of identifying spatially varying 
model parameters as solution errors are not necessarily correlated with the confidence in 
the estimates. It is envisioned that the posterior variance at each point cc e I? in the domain 
interest can serve as the basis for increasing the resolution of the solver at select regions 
and making optimal use of the computational resources available. 

3.2.1 Bridging distributions 7ri2,7^ 

The role of these auxiliary distributions is to facilitate the transition between two different posteriors 
TTi and 7r2 corresponding to two distinct solvers. It is easily understood that if tti and tt2 are not 
significantly different, then fewer bridging distributions will be needed and vice versa. As it is 
impossible to know a priori how pronounced these differences are, in most implementations a rather 
large number of bridging distributions is adopted, erring on the side of safety. We propose an 
adaptive SMC algorithm, that extends existing versions ( ifTOlfTTl ) in that it automatically determines 
the number of intermediate bridging distributions needed. In this process we are guided by the 
Effective Sample Size (ESS, Table[T]) which provides a measure of degeneracy in the population of 
particles. If ESSs is the ESS of the population after the step s and in the most favorable scenario 
that the next bridging distribution 7ri2,7^_^i is very similar to ir^.^,, ESSg+i should not be that much 
different from ESSg. On the other hand if that difference is pronounced then ESSg+i could drop 
dramatically. Hence in order to determine the next auxiliary distribution, we define an acceptable 
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reduction in the ESS, i.e. ESSs+i > C ESSg (where C < 1) and prescribe 7,5+1 (Equation ( |27] |) 
accordingly. The revised Adaptive SMC algorithm is summarized in Table |2l 

Adaptive SMC algorithm: 

1. For 3 = 0, let {9q\ WQ^^yfL^ be the initial particulate approximation to 
"'12.70 = ""i and ESSq the associated effective sample size. Set s = 1. 

2. Reweigh: If w^s\ls) = w^^li ^^"7^)'' are the updated weights as a 
function of 7s then determine 7s so that the associated ESSg = (ESSg-i 

(i) 

(the value ^ = 0.95 was used in all the examples). Calculate Ws for this 7s. 

3. Resample: IfESSg < ESSmm then resample. 

4. Rejuvenate: Use an MCMC kernel Ps(-,-) that leaves 1112,-1^ invariant to 
perturb each particle ©s-i ^ ^,1*' 

5. The current population {©s*^ , w'f^ provides a particulate approximation 
of 7ri2.'y^, in the sense of Equations (l25T l, (|26] |. 

6. If 7s < 1 then set s = s + 1 and goto to step 2. Otherwise stop. 

Table 2: Basic steps of the Adaptive SMC algorithm proposed 
3.2.2 Trans-dimensional MCMC 

As mentioned earlier, a critical component in the SMC framework proposed is the MCMC-based 
rejuvenation step of the particles 0. It should be noted that the kernel Ps(-, •) in the rejuvenation 
step (Step 3 of the SMC algorithm) need not be known explicitly as it does not enter in any of 
the pertinent equations. It is suffices that it is 7ri2,-y^-invariant which is the target density. For the 
efficient exploration of the state space, we employ a mixture of moves which involve fixed dimension 
proposals (i.e. proposals for which the cardinality of the representation k is unchanged) as well as 
moves which alter the dimension k of the vector of parameters 0. We consider a total of M = 7 
such moves, each selected with a certain probability as discussed below. Of those, four involve 
trans-dimensional proposals which warrant a more detailed discussion. 

It is generally difficult to design proposals that alter the dimension significantly while ensuring a 
reasonable acceptance ratio. For that purpose, in this work we consider proposals that alter the 
cardinality k of the expansion by 1 i.e. k' = k — 1 or: k' = k + \ . We adopt the the Reversible-Jump 
MCMC (RJMCMC) framework introduced in ||26ll according to which such moves are defined in 
pairs in order to ensure reversibiUty of the Markov kernel (even though the reversibility condition is 
not necessary, it greatly facilitates the formulations). We consider two such pairs of moves, namely 
birth-death and split-merge. Let a proposal from (/c, 0) to (fc', 0') that increases the dimension i.e. 
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A:' = fc + 1 and e &k, 0' e 0''+^ (see last paragraph of sub-section 13. 1.2b . Let p{k —f k') 
the probability that such a proposal is made (user specified) and p{k' k) the probability that the 
reverse, dimension-decreasing proposal is made. In order to account for the m = dim{&k+i) — 
dim{@k) difference in the dimensions of and 0' , the former is augmented with a vector u E M™ 
drawn from a distribution q{u). Consider a differential and one-to-one mapping h : &k+i ©fe+i 
that connects the three vectors as 9' = h{9, u). Then as it is shown in 1261 . the acceptance ratio of 
such a proposal is: 



. ^,2,-^A0')p{k ^ k') 1 

mm < 1 , — ■r~r~/ ^ — 7 — r 

7ri2.7,(t')p(fc' k) q{u) 



86' 



(28) 



do' 



8(9, u) 

is the Jacobian of the mapping h. Such a proposal is invariant w.r.t. the density 



where 

7i'i2,7a ■ Similarly one can define, the acceptance ratio of the reverse, dimension-decreasing move: 

-1 ~> 



mm 



7:,2,yA0)pik' ^ k) 



89' 



8{9,u) 

In the following we provide details for the reversible pairs used in this work. 



(29) 



Birth-Death: In order to simplify the resulting expressions, we assign the following probabilities 
of proposing one of these moves purth = c min{l, ^^^^^p-} = c (from Equation (fTSI l) and 
Pdeath ~ c mm{l, ^^^^^j^} = c (from Equation (fTSIl). The constant c is user-specified (it is taken 
equal to 0.2 in this work). Obviously if fc = kmax, PUrth = and if fc = 0, pdeath = 0. 

For the death move: 

• A kernel j < j < k ) is selected uniformly and removed from the representation in 
Equation ( fT2b . 

• The corresponding aj is also removed. 
For the birth move: 

• A new kernel + 1 is added to the expansion while the existing terms remain unaltered. 

• The associated amplitude Uk+i is drawn from Af{0, <j1) (the variance a1 is equal to the 
average of the squared amplitudes aj over all the particles at the previous iteration) 

• The associated scale parameter Tk+i is drawn from the prior. Equation ( fTSb 

• The associated kernel location ccfe^i is also drawn from the prior. Equation ( 1211 1. 

Hence the vector of dimension-matching parameters u consists of m = (a^+i, Tfc+i, ajfc+i) and the 
corresponding proposal q{u) is: 

= ^^e-^ ^U^'^l exp(-5...,0 ^ (30) 

It is obvious that the Jacobian of such a transformation is 1. 
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Split-Merge These moves correspond to splitting an existing kernel into two or merging two existing 
kernels into one. Similarly to the birth-death pair, they alter the dimension of the expansion by 1 
and are selected with probabilities Psput ~ 7^ and Pmerge — c. For obvious reasons, Psput ~ 
if fc = k„iax and pmerge ~ if fc < 1. Consider first the merge move between two kernels ji 
and j2- In order to ensure a reasonable acceptance ratio, merge moves are only permitted when the 
(normalized) distance between the kernels is relatively small and when the amplitudes , aj^ are 
relatively similar. Specifically we require that the following two conditions are met; 

" ^'"^^ <5x \aj^- a.j^\<5a (31) 



T- + T- 

31 ^ 32 



(the values 6x = (5a = 1 were used in this work). Two candidate kernels are selected uniformly 
from the pool of pairs satisfying the aforementioned conditions. The proposed kernels ji and j2 are 
removed from the expansion and are substituted by a new kernel j with the following associated 
parameters: 



-j^y (32) 



.,=V^(^ + ^) (33) 



V -Ji V ^2 

This ensures that the average value of the previous expansion (with ji and j2) in Equation 
([12] ) when integrated in is the same with the new (which contains j in place of ji and 

h) 



(34) 



2 

The split move is applied to a kernel j (selected uniformly) which is substituted by two new kernels 
ji, j2- In order to ensure reversibility, kernels ji and j2 should satisfy the requirements of Equation 
(|3TT i and the application of a merge move in the manner described above, should return to the original 
kernel j. There are several ways to achieve this, corresponding essentially to different vectors u and 
mappings h in Equation ( |28] l. In this work: 

• A scalar Ur is drawn from the uniform distribution [/[0, 1] and r^^^ ~ "^^j^ ^j2^ ~ 
(1 — Ur)T~^. This ensures compatibility with Equation ( l32b . 

• A vector u^. is drawn uniformly in the ball of radius R where R ~ The center of the 
new kernels are specified and This ensures compatibility 
with the first of Equation OTl i as well as Equation ( l34b . 

• A scalar Ua is drawn from the uniform distribution J7[— ^ , ^] . The amplitudes of the new 
kernels are determined by a,, = a — Ua and a,, = a + Ua, where d = °+"°(v^~''/i~"t) .^ 
This ensures compatibility with the second of Equation ( [3T1 ) as well as Equation ( [33T l. 
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(a) Birth-Death moves 

Figure 3: Trans-dimensional RJMCMC proposals 



Jl ~] -^32 

(b) Split-Merge moves 



The vector of dimension-matching parameters u (in Equation (|28] |) consists of m = {ut, Ux,Ua) and 
the corresponding proposal q{u) is a product of uniforms in the domains specified above. After some 
algebra, it can be shown that the Jacobian of such a transformation is 2**+^- 



r+\/T^ 



The remaining three proposals, involve fixed-dimension moves that do not change the cardinality of 
the expansion but rather perturb some of the terms involved. In particular, we considered updates 
of the amplitude Uj, scale tj or location Xj of a kernel j selected uniformly (naturally, in the case 
of the amplitudes, the constant gq (Equation ( fT2] i) is also a candidate for updating). Each of these 
three moves is proposed with probability ^{pbtrth + Pdeath + PspHt + Pmerge) = ^(i^T + 
particular: 

1 . Update aj — > a'^: A coefficient aj (in Equation ( fT2] i) is uniformly selected and perturbed 



as: 



i' =aj+aiZ ,Z ^ 7V(0, 1) 



(35) 



2. Update Tj r': A scale parameter Tj (in Equation (fT2l i) is uniformly selected and per- 



turbed as: 



(36) 



(this ensures positivity of rj) 

3. Update Xj x'j: A location Xj ^ V C M'' (in Equation (fT2T i) is uniformly selected and 
perturbed as: 

x',=x,+a3Z, Z={Zu...,Zd), Z,^N{0,1) (37) 

The acceptance ratios are calculated based on the standard MCMC formulas using 7ri2,7, as the 
target density. It should be noted that the variances in the random walk proposals are adaptively 
selected so that the respective acceptance rates are in the range 0.2 — 0.4. As it is well-known 
(chapter 7.6.3 in ||55l ) adaptive adjustments of Markov Chains based on past samples can breakdown 
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ergodic properties and lead to convergence issues in standard MCMC contexts. In the proposed SMC 
framework however, such restrictions do not apply as it suffices that the MCMC kernel is invariant. 
This is an additional advantage of the proposed simulation scheme in comparison to traditional 
MCMC. 



3.3 Prediction 

The significance of mathematical models for the computational simulation of physical processes 
lies in their predictive ability. It is these predictions that serve as the basis for engineering decisions 
in several systems of technological interest. The proposed framework provides a seamless link 
from experiments/data collection, to model validation and ultimately prediction. In the presence 
of significant sources of uncertainty it is important not only to provide predictive estimates but 
quantify the level of confidence one can assign to the predicted outcome. The inferred posteriors 
corresponding to various model resolutions can be used to carry out this task. In accordance with 
the Bayesian mind-set, all unknowns are considered random. If y denotes the output to be predicted 
(under specified input, boundary & initial conditions) then, the predictive posterior p{y \ y) based 
on the available data y can be expressed as ( Il24l ): 

p{y\y) = j p{y,0\y)de = J pr{y\e,y)p{e\y)de (38) 

posterior 

N 

= J LriyJi9)M0) de^Yl ^r'^ ^r(y I ^^'^) 

likelihood 

The term p{y \ 6) is the likelihood of the predicted data determined by the forward solver at reso- 
lution r as in Equation O. Equation ( [38] l offers an intuitive interpretation of the predictive process. 
The predictive posterior distribution is a mixture of the corresponding likelihoods evaluated at all 
possible states 6 of the system , with weights proportional to the their posterior values. In the context 
of Monte Carlo simulations, samples of y from p{y \ y) can be readily drawn using the particulate 
approximation of each tt^ (Equation (IZSTi). These samples can subsequently be used to statistics of 
the predicted output y such as moments, probabilities of exceedance which can be extremely useful 
in engineering practice. 



4 Numerical Examples 

The method proposed is illustrated in problems from nonlinear soUd mechanics using artificial data. 
The governing PDEs are those of small-strain, rate-independent, perfect plasticity with a von-Mises 
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yield criterion and associative flow rule ( l58l ): 



V • (t{x) = 



(conservation of linear momentum) 



(T = 



C(£;,zy) : (e-e") 



(elastic stress-strain relationships) 



h{cT) : 




(tr[<T])2 




(yield surface) 



(flow rule) 



(39) 



where cr is the Cauchy stress-tensor, e = ^(Vii + uV) and the total and plastic-part of the strain 



tensor, v = {vx,Vy, v^) is the displacement vector, C{E, v) is the elastic moduli which depends on 
the Young's modulus E (it was assumed that it was known = 1, 000) and Poisson's ratio v (it was 
assumed that it was known v = 0.3). The field of interest in all the problems examined was the yield 
stress (Tyieidix) which was assumed to vary spatially. The yield stress determines the boundary of 
the elastic domain in the material response. A square two-dimensional domain T> — [0, 1] x [0, 1] 
under plane stress conditions was considered and the forward solvers were Finite Element models 
which discretize the governing PDEs of Equation ( [39] l for x E T). In order to construct a sequence 
of solvers operating at different resolutions, we considered 4 different partitions corresponding to 
uniform 8 x 8, 16 x 16, 32 x 32 and 64 x 64 grids (i.e. with element sizes |x|. t^xt^-^X]^ 
and ^ X ^ respectively). A critical issue with spatially varying parameters is how this variability 
is accounted in the discretized representation. In this work, we adopted a simple rule according 
to which each finite element was assigned a constant yield stress value which was equal to the 
average of the field (Jyieid{x) within the element. This scheme by no means represents a consistent 
upscaling of the governing PDEs let alone being optimal. It can be easily established that it can 
introduce significant deviations in the effective response which depends on the full details of the 
spatially varying field. This poor selection is made however to emphasize the point that inaccurate 
solvers can be useful and can lead to significant improvements in accuracy and efficiency. Their 
role is to provide a computationally inexpensive approximation to the fine-scale posterior that can 
be efficiently updated and refined using a reduced number of runs from more expensive solvers. 
Naturally, if more sophisticated upscaling schemes are introduced, the transitions in the sequence of 
posterior become smoother and the computational effort is further reduced. 

Since <Jyieid{x) > Vx, we used our model to infer log{(j{x)) i.e. in Equation ( fT2] i. f{x) = 
log{a{x)). The adaptive SMC scheme (Table|2]i with N = 100 or = 500 particles was employed 
in the examples presented with C, — 0.95 and ESSmin ~ N/2. The following values for the 
hyperparameters of the prior model were used (section D.1.2t : 

• kmax = 100 and s = 0.1 (Equation ( fT6] l) 



• = 1.0 (Equation ([ITj) and = 0.0001 (Equation (fTST i) 
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(a) 2D view (b) 3D view 

Figure 4: Reference ayieid{x) field for Example A 



• ao = 1.0 and Jdq = 1.0 (Equation ( |20] |) 

• a = 2. and 6 = 1. X 10"*^ (Equation ©) 

4.1 Example A 

In this example it was assumed that the yield stress varied as follows (Figure]?]): 

\ogay,M{x) = -1 cxp{-10 x^~2{y- l)^} - 1 cxp{-2 {x - if - 10 y^} (40) 

The nonlinear governing PDEs (Equation ( |39] l) were solved using a 64 x 64 uniform finite element 
mesh with the following boundary conditions: 

• Vx ^ Vy = along a; = 

• Vx = —Vy = 0.001 along x = 1 

The displacements v^, Vy at a regular grid consisting of 72 points with coordinates 
(0.125 i, 0.125 j), for z = 1, . . . , 8 and j ~ 0, . . . ,8 were recorded resulting in n = 144 data points 
(as in Figure ]4li. The empirical mean (of the absolute values) of these observations was calcu- 
lated and the recorded values were contaminated by Gaussian noise of standard deviation 5% in 
order to obtain sets of observables denoted by {j/ijf^i in our Bayesian model (Equation Q). We 
note that in this example the scale of variability of the unknown field ayieid{x) is larger than the 
scale of observations, i.e. the grid size where displacements were recorded. 

Table]3]reports the number of degrees of freedom per solver and the normalized computational time 
for a single run w.rt. the 64 x 64 solver As mentioned earlier, each finite element was assigned 
a constant yield stress equal to the average value inside the element. This is of course inconsistent 
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Solver 
Resolution 


Freedom 


^! r\i'Tn 11 li 7f^H 1 r\iTir\iitiitir\n5il 
llldllZjCU v^UlllJJULclLUjIIcil 

Time (Actual in sec) 


16 X 16 


510 


ik (0-55) 


32 X 32 


2,046 


TE (4-8) 


64 X 64 


8,190 


1(86) 



Table 3: Computational cost of different resolution solvers for Example A 




with the governing PDEs as the geometry of the variability plays a critical role for the effective 
properties of each element. It is easily understood though that the corresponding posterior should 
have some similarities arising from the mere nature of their construction. 

At first, we attempted to solve the problem by operating solely on the finest solver Using the Adap- 
tive SMC scheme proposed with N = 100 particles, this resulted in a sequence of 163 (between the 
prior ttq and the target posterior) auxiliary bridging distributions constructed as mentioned earlier 
The inferred field (posterior mean and quantiles) are depicted in Figure|5] Even though they exhibit 
similarities with the ground truth (Figure |4]l, there are also considerable differences which suggest 
that the algorithm probably got trapped in some mode of the posterior. This is to be expected due to 
the highly nonlinear nature of the forward solver and the large state space. It is possible however that 
the correct solution could be recovered if the size of the population and/or the number of bridging 
distributions is increased. Inspite of that, it is the significant computational effort that makes such 
an approach impractical. In particular 16, 300 (i.e. 163 x 100) calls to the most expensive forward 
solver were required. 
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Solver 
Resolution 


Number of Bridging 
Distributions 


Computational Effort 
(w.r.t. calls to 64 x 64 solver) 


16 X 16 


176 


113 


32 X 32 


73 


452 


64 X 64 


54 


5,700 


Total 




6,265 



Table 4: Computational cost for inferences for Example A. Note that the effective cost when using 
only the 64 x 64 solver was 16, 300 

In contrast, when a sequence of 3 solvers was used the results obtained are significantly closer to 
the ground truth as it can be seen in Figures |6] and 7. It is observed that even using the coarsest 
solver (16 x 16), we are able to correctly identify some of the basic features of the underlying 
field. The inferences are greatly improved as solvers at finer resolutions are invoked. Figure 8 
depicts the number of bridging distributions needed at each resolution and the respective reciprocal 
temperatures 7^ (Equation dZTli). These were automatically determined by the proposed Adaptive 
SMC with N = 100 particles. It is also observed that the number of intermediate distributions 
needed decreased as finer resolution solvers are used. This is a direct consequence of the ability 
of the proposed scheme to accumulate information from coarser scale solver These results are 
summarized in Table H] which also reports the effective computational cost at the various stages and 
in total. It can be seen that a reduction of the total number of calls is achieved (16, 300 vs. 6,265). 

Figure |9] depicts the posterior densities of the inferred model error standard deviations (Jr described 
in Equation It is readily seen that the proposed technique is able to quantify the magnitude of 
the model error for solvers of various resolutions. Furthermore for the reference resolution 64 x 64 
it correctly detects that the error contamination is of the level of 5%/iA- Finally Figure [TOl depicts 
the marginal posterior on k , i.e. the cardinality of the expansion at various resolutions. It should 
be noted that the method leads to sparse representations (on average A: = 5 and therefore only 
21 parameters are needed) without sacrificing the accuracy. Traditional formulations (deterministic 
or probabihstic) usually have as many unknowns as elements (i.e. in the 64 x 64 mesh, 4, 096 
parameters) and therefore require operations in very high dimensional spaces with all the negative 
implications this carries. 

5 Conclusions 

A general Bayesian framework has been presented for the identification of spatially varying model 
parameters. The proposed model utilizes a parsimonious, non-parametric formulation that favors 
sparse representations and whose complexity can be determined from the data. An efficient infer- 
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(c) Resolution 64 x 64 
Figure 7: Posterior mean at various solver resolutions for Example A 
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Figure 8: Evolution of reciprocal temperature 7., (Equation ( IZTl l) and number of bridging distribu- 
tions 
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Figure 9: Posterior densities of model error st. deviations fj^ as in Equation The values on 
a;-axis have been divided by jjA 
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Figure 10; Posterior for the cardinality k of the field representation 



ence scheme based on SMC has been discussed which is embarrassingly parallehzable and well- 
suited for detecting muhi-modal posterior distributions. They key element is the introduction of an 
appropriate sequence of posteriors based on a natural hierarchy introduced by various forward solver 
resolutions. As a result, inexpensive, coarse solvers are used to identify the most salient features of 
the unknown field(s) which are subsequently enriched by invoking solvers operating at finer resolu- 
tions. The overall computational cost is further reduced by employing a novel adaptive scheme that 
automatically determines the number of intermediate steps. The proposed methodology does not re- 
quire that Markov Chains using all the solvers to be run simultaneously as in other multi-resolution 
formulations ( lISTl ) . The particulate approximations provide a concise way of representing the pos- 
terior which can be readily updated if the analyst wants to employ forward models operating at even 
finer resolutions or in general more accurate solvers. The output of the inference algorithm provides 
estimates of the model error or noise contained in the data. An important feature is the ability to 
readily provide not only predictive estimates but also quantitative measures of the predictive uncer- 
tainty. Hence it offers a seamless link between data, computational models and predictions. The 
efficiency of the sampling schemes proposed could be greatly improved if the proposed moves in- 
corporate information about the governing PDEs and if upscaling relations are available. A feature 
that was not explored in the examples presented is the possibility of performing adaptive refine- 
ment, not for the purposes of improving the forward solver accuracy but rather for increasing the 
resolution of the unknown fields. This can be achieved in two ways and is a direct consequence of 
the ability of the proposed model (and Bayuesian models in general) to produce credible intervals 
for the estimates made at each step. Hence in regions where the variance of the estimates (or some 
other measure of random variability) is high, the resolution of the forward solver can be increased. 
Furthermore, additional measurements/data can be obtained at these regions if such a possibility 
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exists. Hence the proposed framework allows for near-optimal use of the computational resources 
and sensors available. 
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